Using programming to get stories

Louis Goddard

Who am I?

Work

  • From 2013: did a PhD in English and freelanced

  • Data journalist at The Times and Sunday Times since 2016

  • Data advisor at Global Witness from next month

PhD work

Electric cars, Sugar daddies, Child abuse, Passwords, Trains, Church land, Oxbridge places, Ski doping, Consultations, Corbyn, Antidepressants, Gagging clauses 1, US campaign finance, Gagging clauses 2, School funding, Tax 1, Tax 2

Interests

  • Cyber-crime and the dark net

  • Transparency and open data

  • Bringing innovative techniques to data journalism

Tools

  • R programming language

  • Tidyverse family of packages

  • Elasticsearch and Kibana

How can programming help us get stories out of data?

The three As

  • Access

  • Amalgamation (OK, this one’s not great)

  • Analysis

Access

  • People talk a lot about creating new data

  • Another way to think of it: accessing hidden data

  • Examples: web scraping, using APIs, getting data from PDFs and Word documents, working with data bigger than Excel can handle

Amalgamation

  • 2 + 2 = 5

  • When combined, data is more than the sum of its parts

  • Examples: joins and fuzzy matching, working with weird file formats, tidying data
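A minimal sketch of a join with fuzzy matching, written in Python with only the standard library rather than the R tools this talk uses. The two datasets, the field names and the `normalise` and `fuzzy_join` helpers are all hypothetical, and the 0.8 similarity cutoff is an arbitrary starting point that would need checking by hand on real data.

```python
import difflib

# Hypothetical data: the same organisations spelled differently in two sources.
contracts = [
    {"supplier": "Acme Holdings Ltd", "value": 120000},
    {"supplier": "Bright Futures Trust", "value": 45000},
]
donations = [
    {"donor": "ACME Holdings Limited", "amount": 5000},
    {"donor": "Brite Futures Trust", "amount": 250},
]


def normalise(name):
    """Lower-case a name and drop common company suffixes before comparing."""
    words = name.lower().replace(".", "").split()
    return " ".join(w for w in words if w not in {"ltd", "limited", "plc"})


def fuzzy_join(left, right, left_key, right_key, cutoff=0.8):
    """Join each left row to its closest right row, if it is similar enough."""
    lookup = {normalise(row[right_key]): row for row in right}
    joined = []
    for row in left:
        matches = difflib.get_close_matches(
            normalise(row[left_key]), list(lookup), n=1, cutoff=cutoff
        )
        if matches:
            # Merge the two records into one combined row.
            joined.append({**row, **lookup[matches[0]]})
    return joined


matched = fuzzy_join(contracts, donations, "supplier", "donor")
```

Fuzzy matches are leads rather than facts: each matched pair should be checked manually before it goes anywhere near a story.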

Analysis

  • How can programming help us see stories?

  • Visualisation is important, but not the be-all and end-all

  • Examples: geospatial analysis, time series analysis, statistical analysis, search

Access

Web scraping

  • Getting information from a website into structured form

  • 90% of scraping jobs for stories follow this format:

    1. Page through an index to get links to individual items
    2. Load each item and extract some data
    3. Combine the results
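The three steps above can be sketched as follows. This is an illustration in Python's standard library, not the rvest workflow used in the talk, and it runs against a made-up site held in memory so that it works offline. The page contents, the `item` and `next` class names and the `PageParser`, `parse` and `scrape` names are all assumptions; a real scraper would fetch each URL over HTTP and use selectors that match the target site.

```python
from html.parser import HTMLParser

# Hypothetical site held in memory; a real scraper would download each page.
PAGES = {
    "/index?page=1": '<a class="item" href="/item/1">One</a>'
                     '<a class="item" href="/item/2">Two</a>'
                     '<a class="next" href="/index?page=2">Next</a>',
    "/index?page=2": '<a class="item" href="/item/3">Three</a>',
    "/item/1": "<h1>First consultation</h1>",
    "/item/2": "<h1>Second consultation</h1>",
    "/item/3": "<h1>Third consultation</h1>",
}


class PageParser(HTMLParser):
    """Collects item links, the 'next' link and the <h1> text of one page."""

    def __init__(self):
        super().__init__()
        self.item_links = []
        self.next_link = None
        self.title = ""
        self._in_h1 = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "item":
            self.item_links.append(attrs["href"])
        elif tag == "a" and attrs.get("class") == "next":
            self.next_link = attrs["href"]
        elif tag == "h1":
            self._in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1:
            self.title += data


def parse(url):
    """Parse one stored page and return the populated parser."""
    parser = PageParser()
    parser.feed(PAGES[url])
    return parser


def scrape(start_url):
    # 1. Page through the index to get links to individual items.
    links, url = [], start_url
    while url is not None:
        page = parse(url)
        links.extend(page.item_links)
        url = page.next_link
    # 2. Load each item and extract some data.
    # 3. Combine the results into one list of records.
    return [{"url": link, "title": parse(link).title} for link in links]


rows = scrape("/index?page=1")
```

The resulting list of records is the structured form that can then be written to a CSV or loaded into a data frame.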

Consultations

Screenshots: Consultations screens 1 to 4

Web scraping

  • Key tools: rvest and SelectorGadget
##   [1] "Melanie Phillips"                                                                                 
##   [2] "Brain fitness"                                                                                    
##   [3] "New challenge"                                                                                    
##   [4] "The Daily Quiz"                                                                                   
##   [5] "Blow for May as backstop risk ‘unchanged’"                                                        
##   [6] "Second heavy defeat almost certain"                                                               
##   [7] "May’s new Brexit deal and what it means for the backstop"                                         
##   [8] "Fifth letter bomb has yet to be found, claims ‘IRA’"                                              
##   [9] "Ten countries ground 737s as safety fears grow"                                                   
##  [10] "Pilots fear system is flawed"                                                                     
##  [11] "Air pollution ‘kills more people than smoking’"                                                   
##  [12] "Dirty air causes harm from cradle to the grave"                                                   
##  [13] "Great British walk that’s the peak of perfection"                                                 
##  [14] "Spread out the maps, it’s spring at last"                                                         
##  [15] "Gangs ‘give pupils knives to get them thrown out of school’"                                      
##  [16] "One in five women killed by their partners had contacted the police"                              
##  [17] "MI5: Thatcher shielded MP accused of child abuse"                                                 
##  [18] "Body found in search for British backpacker"                                                      
##  [19] "BBC under investigation over pay discrimination"                                                  
##  [20] "TV chef sues over gastropub revamp"                                                               
##  [21] "Storm Gareth bringing 80mph winds"                                                                
##  [22] "Housing developer forced to ditch ‘wicked’ tree nets"                                             
##  [23] "May claims victory in Brexit backstop talks"                                                      
##  [24] "PM dashes to Strasbourg on a wing and a prayer to save Brexit"                                    
##  [25] "EU planning for year-long extension, diplomats say"                                               
##  [26] "Raab leadership bid opens with pledge on social mobility"                                         
##  [27] "PM takes flight to be far from the maddest crowd"                                                 
##  [28] "She’s gone and done it - but will it be enough?"                                                  
##  [29] "Cabinet row over cash to stop wave of knife crime"                                                
##  [30] "Rebel group draws a third of Labour MPs"                                                          
##  [31] "Family of Isis bride pleads with Javid to show mercy"                                             
##  [32] "More abuse, fewer arrests: the figures from stretched police"                                     
##  [33] "A sixth of London homicides are linked to violence against women"                                 
##  [34] "Victim calls for register of abusers"                                                             
##  [35] "New approach stretches beyond crime to culture"                                                   
##  [36] "Patients face longer A&E waits as targets scrapped, doctors warn"                                 
##  [37] "More work needed to convince patients"                                                            
##  [38] "Captain Marvel fights off online trolls with $455m opening weekend"                               
##  [39] "Nun’s letter shakes up history of Lisbon earthquake"                                              
##  [40] "Honour for British UN worker killed in Boeing crash"                                              
##  [41] "Mystery of what happened on flight 302 could soon be revealed"                                    
##  [42] "China bans use of jet amid fears over safety"                                                     
##  [43] "Lucky passenger was turned away at gate"                                                          
##  [44] "Stabiliser may have tipped fatal flight out of pilots’ control"                                   
##  [45] "Marine who lost leg rows Atlantic in record time"                                                 
##  [46] "Solar storms could cripple modern life"                                                           
##  [47] "For perfect family TV, bring back The Bill"                                                       
##  [48] "Alexa, popcorn and bakeware enter the great British basket"                                       
##  [49] "School offers PPE course to inspire girls into politics"                                          
##  [50] "Eye test could spot Alzheimer’s earlier"                                                          
##  [51] "Digital ‘friends’ help students get to lessons"                                                   
##  [52] "Sins of the (Time) Lord"                                                                          
##  [53] "In pictures: Commonwealth Day"                                                                    
##  [54] "Heathrow extension ‘will be the size of another Gatwick’"                                         
##  [55] "Policewoman took £34,000 in gifts from lonely widower, 87"                                        
##  [56] "Tears of teenage boy in court over Jodie stabbing"                                                
##  [57] "Fan jailed for punching player on pitch"                                                          
##  [58] "Who likes the Independent Group and where do they stand on policy?"                               
##  [59] "I’ll vote against this deal and face Brexit voters at the ballot box"                             
##  [60] "May has one last escape route from this sorry saga"                                               
##  [61] "Hammond has a golden chance to build for the homeless"                                            
##  [62] "News in pictures"                                                                                 
##  [63] "EU waits to see winner of this power struggle"                                                    
##  [64] "No Phoebe, violence doesn’t empower women"                                                        
##  [65] "Zuckerberg’s view of privacy is self-serving"                                                     
##  [66] "Advice for a soft lad from his big lad friend"                                                    
##  [67] "Face-offs that start with a peach can end with a brick"                                           
##  [68] "Four-hour target for A&E patients has had its day"                                                
##  [69] "Avoidable Damage"                                                                                 
##  [70] "Flight Hazards"                                                                                   
##  [71] "Chronicler of Catastrophe"                                                                        
##  [72] "Call for tighter control of motorbike emissions"                                                  
##  [73] "Nature notes"                                                                                     
##  [74] "Birthdays today"                                                                                  
##  [75] "New Assad statue provokes wave of protests"                                                       
##  [76] "Iraq agrees to take back all 20,000 of its jihadis"                                               
##  [77] "US set to eclipse Saudi Arabia as world’s biggest oil exporter"                                   
##  [78] "President of Algeria abandons re‑election bid"                                                    
##  [79] "Syria’s fate in hands of world leaders jockeying for power"                                       
##  [80] "Guaidó pleads for foreign help to solve power crisis"                                             
##  [81] "German gnome industry on its knees"                                                               
##  [82] "Gardener targeted clients from beyond grave"                                                      
##  [83] "Democrats threaten Trump border wall"                                                             
##  [84] "Fundraiser sold access to White House"                                                            
##  [85] "Secret burial for Göring’s daughter, 80"                                                          
##  [86] "Five Star backs down in vaccination row"                                                          
##  [87] "King with 14 wives loses his second queen in a year"                                              
##  [88] "28 years on, Ethiopian ‘killers’ still in embassy"                                                
##  [89] "Woman accused of poisoning Kim’s half brother is freed"                                           
##  [90] "Indian poll row as Modi uses captured pilot in campaigning"                                       
##  [91] "Salvini rejects Saudi backing for La Scala"                                                       
##  [92] "Traditional cheesemakers kick up a stink with MPs"                                                
##  [93] "Man scoops $273m lottery on ticket he left at the till"                                           
##  [94] "Turkey says speculation led to slump"                                                             
##  [95] "Chechen activist faces penal colony after ‘fake’ drugs bust"                                      
##  [96] "The ‘happy house’ that is a misery for sellers"                                                   
##  [97] "Cox’s view of backstop concessions hits pound"                                                    
##  [98] "UK economy rebounds in January"                                                                   
##  [99] "You can be better off outside EU, says man tipped to replace Carney"                              
## [100] "Feathers fly after RSPB shoots down Kestrel sponsorship"                                          
## [101] "Ryanair will ban Britons from buying shares"                                                      
## [102] "More powerful regulator to replace under-fire FRC"                                                
## [103] "Someone will game even the fairest pay rules . . . you can bank on that"                          
## [104] "Barrick’s tale of the unexpected"                                                                 
## [105] "Suitors show interest in G4S’s cash-handling business"                                            
## [106] "Resilient economy boosts banks, housebuilders and retailers"                                      
## [107] "888 Holdings spreads its bets"                                                                    
## [108] "Doubts expressed over Gilbert’s role at Revolut"                                                  
## [109] "Battle ends with gold miners on same side"                                                        
## [110] "British Land chairman is heading for the door"                                                    
## [111] "Van hire boss faces crunch meeting after investor’s move"                                         
## [112] "Corporate confidence falls to new low"                                                            
## [113] "Financial sector ‘to start cutting jobs’"                                                         
## [114] "Jeans maker Levi fashions a return to the market"                                                 
## [115] "Ex-Superdry chief not welcome back, bosses tell investors"                                        
## [116] "Accounting error increases Kier’s debt"                                                           
## [117] "No-deal is a stepping stone to breaking free from too many tariffs"                               
## [118] "Sterling rises on May deal"                                                                       
## [119] "Central banks need to ease up on QE, warns the man who anticipated crisis"                        
## [120] "Igas turns up heat on shale gas exploration"                                                      
## [121] "WPP signs Microsoft chief to help battle with online giants"                                      
## [122] "Cairn Energy still waiting for Indian tax row ruling"                                             
## [123] "Law firm breaks new ground with £366m flotation"                                                  
## [124] "Clarkson sinks after warning of gloom ahead"                                                      
## [125] "Unilever chief paid €11.7m despite failed restructuring"                                          
## [126] "Provident rubbishes rival’s ‘misleading’ takeover bid"                                            
## [127] "Interserve lenders ready to sweeten rescue deal"                                                  
## [128] "Stout approach is not about to change"                                                            
## [129] "HMV rescue hits flat note for creditors"                                                          
## [130] "Bank puts two former army colleagues in the front line"                                           
## [131] "Companies should be allowed to choose how best to deliver justice"                                
## [132] "Your three-minute digest"                                                                         
## [133] "The beauty of manufacturing and exporting is more than skin deep"                                 
## [134] "Buveur D’Air and Apple’s Jade ready to serve up classic"                                          
## [135] "Champion tipster Rob Wright’s best bets for day one of Cheltenham"                                
## [136] "Conte payoff row with Chelsea escalates"                                                          
## [137] "Zidane faces a completely different test in second coming: the rebuild of Real"                   
## [138] "Public will inflict proper punishment, not courts"                                                
## [139] "Grealish attacker jailed for 14 weeks as FA seeks talks over player safety"                       
## [140] "‘I couldn’t sleep for days – I’ve reached out to Jack’"                                           
## [141] "Lions should give Gatland the job now"                                                            
## [142] "Times Sport Unseen: the best of our photographers’ pictures this week"                            
## [143] "Collective punishment will not solve crowd trouble"                                               
## [144] "How is security at football grounds supposed to work?"                                            
## [145] "City still teenagers in Europe even if we win it, says Guardiola"                                 
## [146] "Nine months on, Zidane is back to rescue ailing Real"                                             
## [147] "Art meets science: how a horse jumps"                                                             
## [148] "From birth to Cheltenham: the making of a contender"                                              
## [149] "Rising stars, party bars and the year of women jockeys"                                           
## [150] "Focus on safety like never before"                                                                
## [151] "The big questions this week"                                                                      
## [152] "The Game Dissected: how Pérez is spearheading Newcastle’s survival bid"                           
## [153] "The key owners to follow at the Cheltenham Festival"                                              
## [154] "Times Sport Dissects: How to win the Gold Cup"                                                    
## [155] "Van Dijk: we need to find rare resilience"                                                        
## [156] "Gündogan’s exit hint gives City dilemma"                                                          
## [157] "‘I used to be spiteful but I have grown soft. A bit of me has gone’"                              
## [158] "England pile on pressure as Wales bid for grand slam"                                             
## [159] "Top European clubs fight Nations League"                                                          
## [160] "Heavy rollers and flatter seams aimed at giving batsmen a fair crack"                             
## [161] "Bayliss gives Archer no guarantees"                                                               
## [162] "Picasso of the baize finding new art forms"                                                       
## [163] "Pakistan demand action on India caps"                                                             
## [164] "Trippier faces battle to be fit for England squad"                                                
## [165] "City to compensate Bennell abuse victims"                                                         
## [166] "Silva charged over referee rant"                                                                  
## [167] "Bolton wages still unpaid"                                                                        
## [168] "Olympic medallist Ogogo forced to retire"                                                         
## [169] "Djokovic ‘snubbed’ Federer over Kermode contract meeting"                                         
## [170] "Champion tipster of the year Rob Wright’s racing tips"                                            
## [171] "A master at work: Britain’s greatest sports photographer’s best pictures"                         
## [172] "Wallace Broecker"                                                                                 
## [173] "Tony Pike"                                                                                        
## [174] "Vanda Salmon"                                                                                     
## [175] "Lives remembered"                                                                                 
## [176] "March 11"                                                                                         
## [177] "Tempestuous weather during ‘clanging arch of steel-grey March’"                                   
## [178] "The disarmament of Germany"                                                                       
## [179] "Crossword Club"                                                                                   
## [180] "Times Concise No 7910"                                                                            
## [181] "Times Quick Cryptic No 1306"                                                                      
## [182] "Times Cryptic No 27296"                                                                           
## [183] "Concise Quintagram No 321"                                                                        
## [184] "Cryptic Quintagram No 321"                                                                        
## [185] "Sudoku No 10558 Super fiendish"                                                                   
## [186] "Sudoku No 10557 Difficult"                                                                        
## [187] "Sudoku No 10556 Mild"                                                                             
## [188] "Killer Sudoku No 6480 Tough"                                                                      
## [189] "Killer Sudoku No 6479 Moderate"                                                                   
## [190] "Brain Trainer No 2825"                                                                            
## [191] "Cell Blocks No 3477"                                                                              
## [192] "Codeword No 3594"                                                                                 
## [193] "Futoshiki No 3387"                                                                                
## [194] "Kakuro No 2346"                                                                                   
## [195] "KenKen No 4586"                                                                                   
## [196] "Lexica No 4694"                                                                                   
## [197] "Lexica No 4693"                                                                                   
## [198] "Polygon"                                                                                          
## [199] "Set Square No 2349"                                                                               
## [200] "Suko No 2495"                                                                                     
## [201] "Bridge"                                                                                           
## [202] "Chess"                                                                                            
## [203] "Age-proof your brain"                                                                             
## [204] "Angie Thomas — the new queen of teen fiction — on race, poverty, TV comedy and Harry Potter"      
## [205] "Do 1 in 7 students really cheat?"                                                                 
## [206] "Dr Mark Porter: Blood pressure treatment is on the rise. Here’s why"                              
## [207] "Feeling the burn? Here’s what you can do about acid reflux"                                       
## [208] "Three ways to prepare for the hay fever season"                                                   
## [209] "Food fight: cashews v almonds"                                                                    
## [210] "Robert Crampton: You say ‘I’m a free spirit’, but you’re just trying to avoid taking the bins out"
## [211] "J-Lo and A-Rod"                                                                                   
## [212] "Victoria at the Grand Theatre, Leeds"                                                             
## [213] "The Hold Steady at the Electric Ballroom, NW1"                                                    
## [214] "Lizz Wright at the Queen Elizabeth Hall"                                                          
## [215] "London Symphony Orchestra/ Haitink at the Barbican"                                               
## [216] "National Dance Company Wales at the Linbury, Royal Opera House"                                   
## [217] "The Times Daily Quiz"                                                                             
## [218] "TV review: Cheat; The Choir — Our School by the Tower"                                            
## [219] "What’s on TV tonight"                                                                             
## [220] "Lindsey Bareham’s mussels with bacon"                                                             
## [221] "Captain Marvel"                                                                                   
## [222] "Disinfectants ‘fuelling rise of MRSA on patient wards’"                                           
## [223] "‘Final justice’ for victims as serial killer dies in jail"                                        
## [224] "Rail union threatens carnage if members denied drivers’ deal"                                     
## [225] "Observatory razed in fire to be rebuilt"                                                          
## [226] "Antisemitism row as ex-MP shares offensive cartoon"                                               
## [227] "Calls grow for ban on sex-for-rent ads"                                                           
## [228] "Labour offers no more than gesture politics"                                                      
## [229] "New channel has replaced that whiff of the kailyard"                                              
## [230] "Jim Arnold"                                                                                       
## [231] "Lack of choice in school subjects hits job prospects"                                             
## [232] "Forgive me, begs man jailed for deliberately spreading HIV"                                       
## [233] "Politicians are wildly out of touch on smacking ban"                                              
## [234] "Academic ‘said he would help student if he could spank him’"                                      
## [235] "Scots prefer independence to no-deal Brexit or May’s plan"                                        
## [236] "Cheating in exams on rise"                                                                        
## [237] "13-year-old boy stabbed during school lunch break"                                                
## [238] "Confidence among Scottish companies reaches new low"                                              
## [239] "Scots hit by McGregor’s retirement"                                                               
## [240] "Morelos told not to hold back tonight"                                                            
## [241] "Jack: abuse from fans in Aberdeen is a compliment"                                                
## [242] "McInnes: we haven’t missed our chance"                                                            
## [243] "Kilmarnock winning in the rain"                                                                   
## [244] "Russell: we’re still trying to see positives"                                                     
## [245] "Brexit backstop fears can be put to bed, Varadkar insists"                                        
## [246] "Boeing crash victim ‘wanted to save the world’"                                                   
## [247] "UN warned it would raise alarm over princess’s fate"                                              
## [248] "Young homeowners would be hardest hit by ECB rate rise"                                           
## [249] "‘Brave’ cervical cancer campaigner helps to boost HPV vaccine uptake"                             
## [250] "Nurses head back to Labour Court in row over contract"                                            
## [251] "Beast O’Driscoll ‘killed in street after IRA dispute’"                                            
## [252] "Alleged Omagh bombers declared bankrupt"                                                          
## [253] "Farmer accused of murder asked gardaí how DJ died"                                                
## [254] "General feels ‘sympathy’ after Ballymurphy killings"                                              
## [255] "NUI professor on a mission to explore the Indian Ocean"                                           
## [256] "Isis bride will not be stripped of citizenship, says Varadkar"                                    
## [257] "IRA victims condemn honour for McGuinness"                                                        
## [258] "Online plan to help clarify payments for child welfare"                                           
## [259] "Children’s trains halted as insurance crisis takes toll on small businesses"                      
## [260] "‘Damaging gusts’ expected as Storm Gareth blows in"                                               
## [261] "90% of health and school contracts going over budget"                                             
## [262] "Store raider used trollies to halt gardaí"                                                        
## [263] "Sinn Féin to reject higher carbon tax"                                                            
## [264] "Westminster ‘should end’ Northern Ireland abortion ban"                                           
## [265] "Mullein arrive with unmistakable clumps of grey-green, felted leaves"                             
## [266] "Macron’s vision for Europe has a blind spot"                                                      
## [267] "Crying wolf about Brexit has made us fail to prepare"                                             
## [268] "Ireland can no longer ignore Islamist threat"                                                     
## [269] "The Wrong Home"                                                                                   
## [270] "Irish mortgage rates will be expensive ‘for years to come’"                                       
## [271] "Dublin the big winner of ‘Brexodus’"                                                              
## [272] "Short-term notes set to boost coffers"                                                            
## [273] "Software firm in Canadian breakthrough"                                                           
## [274] "Schmidt: we need to maintain top level to win"                                                    
## [275] "Pro14 to carry on during the World Cup"                                                           
## [276] "Ireland still face some big questions"                                                            
## [277] "How Schmidt’s arrival signalled the start of France’s worst nightmare"                            
## [278] "Veterans are crucial to giving Wales a run for their money"                                       
## [279] "Sayeh still breaking down barriers after escape from Liberia"                                     
## [280] "Division 1 proving that there really is no place like home"                                       
## [281] "Intercounty demands leave players with no life – Scallan"

APIs

  • API: application programming interface

  • Structured way for programs to talk to each other

  • Lots of organisations provide them: private sector and public sector
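To illustrate what "structured" means here, a small Python sketch: an API typically returns JSON, which loads straight into nested lists and dictionaries with no HTML parsing. The response body and its field names (`results`, `id`, `title`, `value`) are invented for the example; every real API documents its own.

```python
import json

# Hypothetical response body; every real API documents its own fields.
body = """
{"page": 1,
 "results": [
   {"id": "a1", "title": "First notice", "value": 1500},
   {"id": "b2", "title": "Second notice", "value": 98000}
 ]}
"""

data = json.loads(body)  # JSON text becomes nested dicts and lists

# Flatten the nested structure into simple rows ready for a table.
rows = [(r["id"], r["title"], r["value"]) for r in data["results"]]
total_value = sum(value for _, _, value in rows)
```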

Gagging clauses 1

APIs

https://www.contractsfinder.service.gov.uk/Published/Notices/OCDS/Search?stages=award&order=ASC&page=1

Open contracting API data
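The Contracts Finder URL above carries its options in the query string, so paging through results is a matter of changing the `page` value. A hedged Python sketch of building those URLs is below; the `search_url` helper is hypothetical, and how many pages exist or what rate limits apply are not covered here and would need checking against the service itself.

```python
from urllib.parse import urlencode

# Base address taken from the URL shown on the slide.
BASE = "https://www.contractsfinder.service.gov.uk/Published/Notices/OCDS/Search"


def search_url(page, stages="award", order="ASC"):
    """Build the search URL for one page of results."""
    query = urlencode({"stages": stages, "order": order, "page": page})
    return f"{BASE}?{query}"


# URLs for the first three pages; a real job would request each in turn
# and stop when a page comes back empty.
urls = [search_url(page) for page in range(1, 4)]
```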

Electric cars

APIs

Live demo! https://www.zap-map.com/live/